Abstract: As an important application in today's busy world,
mobile video conferencing facilitates people's virtual face-to-face
communication with friends, families and colleagues, via
their mobile devices on the move. However, how to provision 
high-quality, multi-party video conferencing experiences over 
mobile devices is still an open challenge. Our survey of 7
representative mobile video conferencing applications shows that
at most 2 to 4 concurrent participants can be supported in one
conference. The fundamental problem behind this limitation is a lack of
computation and communication capacity on mobile devices
to scale to large conferencing sessions. In this paper, we present
vSkyConf, a cloud-assisted mobile video conferencing system, 
to fundamentally improve the quality and scale of multi-party 
mobile video conferencing. By employing a surrogate
virtual machine in the cloud for each mobile user, we enable fully
scalable communication among the conference participants via 
their surrogates, rather than directly. The surrogates exchange 
conferencing streams among each other, transcode the streams 
to the most appropriate bit rates, and buffer the streams for 
the most efficient delivery to the mobile recipients. A fully 
decentralized, optimal algorithm is designed to decide the best 
paths of streams and the most suitable surrogates for video 
transcoding along the paths, such that the limited bandwidth 
is fully utilized to deliver streams of the highest possible quality 
to the mobile recipients. We also carefully tailor a buffering 
mechanism on each surrogate to cooperate with optimal stream 
distribution. Together they guarantee bounded, small end-to-end 
latencies and smooth stream playback at the mobile devices, in 
the video conferencing sessions. We have implemented vSkyConf 
based on Amazon EC2 and verified the excellent performance of 
our design, as compared to the widely adopted unicast solutions. 



I. Introduction 

Online video conferencing has been widely deployed for
virtual, face-to-face communication among separate parties,
as a greener solution to replace many energy-expensive
conference trips. Advances in mobile and wireless communication
technologies have enabled mobile users to adopt a new
evolution of phone calls, namely mobile video conferencing calls,
as part of their everyday life, anytime and anywhere on the
move.

A number of mobile video conferencing applications have
emerged [1]-[4]. Many rely on expensive, dedicated architectures,
e.g., multipoint control units (MCUs), to process
signaling messages, transcode ingress session streams and
disseminate multiple streams to each end device. Such a
centralized solution is limited in scalability, and the expensive
up-front investment prohibits its wide adoption by small or
medium institutions, let alone individual users. Distributed,
peer-to-peer (P2P) based mobile video conferencing solutions
have also been deployed, e.g., Skype mobile [1], which leverages
intermediate super nodes for session relays.

TABLE I
Representative Mobile Video Conferencing Apps: A Comparison

App        Structure   Max. # of Participants   Cellular Call Support
FaceTime   P2P         2                        no
LifeSize   S/C         4                        yes
Skype      P2P         2                        yes
Vidyo      S/C         4                        yes
Fring      P2P         4                        yes
Fuze       S/C         4                        yes
Tango      P2P         2                        yes

Can the existing mobile video conferencing systems support
high-quality, multi-party video conferencing over mobile
devices? We seek the answer by surveying 7
representative applications, with results given in Table I. We
observe that applications with infrastructure support (S/C) tend
to support more concurrent users at the cost of expensive user
subscription fees [2], [4], while P2P-based solutions are reluctant
to allow group video calls, for fear of compromising call
quality. Skype is generally believed to provide the best call
quality, but it only supports two-way video communication on
mobile phones (whereas web-based Skype allows 10 concurrent
communication sessions among premium user accounts), and
the same holds for Tango [3]. Most applications stick to a single
streaming rate. Skype and Fring have recently introduced Dynamic
Video Quality (DVQ), which adapts video bit rates to transient
connection conditions [5], but only a limited number of bit
rates are supported, e.g., medium-quality streams of 256 kbps
and higher-quality streams of 512 kbps in Skype.

Based on this state of the art, we conclude that high-quality,
multi-party mobile video conferencing remains an open goal.
We summarize the key challenges as follows. (1)
The workload in a video conferencing session, in
terms of both processing and transmission, grows quadratically
with the session size, since every participant exchanges streams
with every other participant, which makes it challenging to
use mobile devices for multi-party video conferencing. (2)
Mobile users are equipped with different devices and downlink
speeds; a high-quality solution should provide differentiated
call quality to different users, instead of a homogeneous
broadcast quality dictated by the low-end users, as in
a traditional solution.
In this paper, we present vSkyConf, a cloud-assisted mobile
video conferencing system, to fundamentally enable high-quality,
multi-party video conferencing over heterogeneous
mobile devices. The cloud computing paradigm offers ubiquitously
accessible computing resources, with on-demand resource
provisioning at modest cost. The paradigm particularly
compensates for the inherent resource deficiencies of
mobile devices, and catalyzes the ongoing evolution of
the burgeoning mobile computing industry. In vSkyConf, we
dynamically provision a virtual machine in the cloud as the
exclusive surrogate for each dialed-in mobile user. Each mobile
device uploads its stream to its surrogate and downloads
others' streams from the surrogate; the surrogates exchange
conferencing streams among each other, transcode the streams
to the most appropriate bit rates, and buffer the streams for the
most efficient delivery to the mobile recipients. By leveraging
the more powerful processing capabilities and stable wired
network bandwidth of the cloud, mobile users offload otherwise
on-device tasks to the cloud, yielding substantial power savings
and quality enhancement, while achieving fully scalable
communication among the conference participants.

To realize such a design, several key questions remain to be 
answered: (1) How should we map the quadratically increasing 
session flows to the links between surrogates, to achieve 
satisfactory streaming experience with latency guarantees? (2) 
To which surrogates should the necessary transcoding tasks 
be assigned, considering different computing capacities of the 
surrogates? (3) In dynamic networking environments where 
jitters happen frequently, how should a surrogate help to 
smooth out jitters for mobile devices? 

To address these issues, we design a fully decentralized, optimal
algorithm to decide the best paths of streams and the
most suitable surrogates for video transcoding along the paths,
such that bandwidth capacities in the system are fully utilized
to deliver streams of the highest possible quality to the mobile
recipients. We also carefully tailor a buffering mechanism on
each surrogate to cooperate with optimal stream distribution.
Together they guarantee bounded, small end-to-end latencies
and smooth stream playback at the mobile devices in the
video conferencing sessions. We have implemented vSkyConf
based on Amazon EC2. Experiments in real-world settings
demonstrate the high scalability, full adaptability, and excellent video
conferencing quality achieved by our design, as compared to
the widely adopted unicast solutions.

The remainder of this paper is organized as follows. We conduct
a thorough literature survey in Sec. II. Unique challenges
and the system architecture are presented in Sec. III. Design
details unfold in Sec. IV, followed by Sec. V introducing the
deployed prototype as well as real-world evaluations. Finally,
Sec. VI concludes the paper.

II. Related Work 

Although it has been studied extensively over the past decades,
video conferencing (VC) has recaptured people's interest in the new
"smartphone" era, with a series of works and systems springing
up recently [4], [6]-[10], which can be categorized
into Server-to-Client (S/C) based and Peer-to-Peer (P2P) based
solutions.

The network-layer solution that naturally supports VC is
still IP multicast [11]. However, its limited scalability,
deployment difficulties and security issues still prevent it
from being a practical choice.

Cloud computing, as a naturally agile solution, compensates
well for the deficiencies of mobile devices for media streaming,
in terms of both processing and bandwidth support. Traditional
players [4], [12] in the VC marketplace have recently
claimed support for mobile users of different platforms
via their private clouds. WebEx [12] builds its services
on Cisco clouds. Vidyo [4] even advertises the slogan
"Conferencing-as-a-Service", and offers a complete solution
by provisioning virtual MCUs on top of its VidyoRouters
[4], bearing a similar flavor to its traditional dedicated
infrastructure. In contrast to such centralized solutions for
enterprise users, our work provisions a VM surrogate
for each ordinary mobile user in an IaaS cloud, in a more
affordable manner.

Another line of work exploits scalable video coding
(SVC) to enable differentiated services for users with different
available bandwidths. Huang et al. [9] leverage clouds to
encode videos into layered rates, but the encoding complexity
inevitably incurs intolerable delays for a time-sensitive application
like video conferencing. Besides, the output bit rates of
SVC encoders are restricted within a range, which is inflexible
under highly dynamic network conditions.

Compared to the S/C model, P2P is deemed a more
promising structure. Both Ponec et al. [6] and Chen et al. [7]
formulate utility maximization problems and enable multi-party
VC by building multi-rate multicast trees. They focus
more on streaming rate allocation over physical links, but
do not investigate transcoding flexibility in depth. Liang
et al. [10] leverage the upload capacities of "helpers" from
other swarms, in ways similar to Skype (not
Skype mobile). Though promising, this approach is difficult to
apply to mobile users, who are reluctant to contribute
resources to strangers due to constrained batteries and expensive
cellular data fees. De Cicco et al. [8] conduct thorough
measurements evaluating Skype's video-rate adaptation to
bandwidth variations and reveal that only 450 kbps can be achieved
even under good network conditions. These measurements
serve as useful references in our evaluation of the vSkyConf
prototype implementation.

The dominant solution in most existing P2P-based mobile
VC applications is still pairwise unicast, e.g., Fring [5],
Tango [3], etc., due to its simplicity of implementation. However,
the limited uplink bandwidth of mobile devices leads to a
constrained swarm size. Our real-life experiments in Sec. V
also reveal its susceptibility to network jitter, especially in
long-haul sessions, compared with vSkyConf.

A recent work by Feng et al. [13] leverages inter-datacenter
networks to maximize the overall throughput of all conferencing
sessions, based on intra-session network coding. vSkyConf
considers both dynamic session routing and adaptive session
transcoding, and advocates exploiting a cloud infrastructure for
mobile video conferencing. Little effort has been devoted to
building a cloud-assisted multi-party mobile video conferencing
system that caters to the needs of ordinary mobile users in
their daily lives. vSkyConf is designed with this goal in mind.
Its framework and associated protocol suite can also be applied
to other delay-sensitive, rate-differentiated, multi-party mobile
video streaming applications.

Fig. 1. The architecture of vSkyConf.

III. Architecture and Design Objectives 

In this section, we highlight the key components and design
principles of the cloud-assisted video conferencing system,
vSkyConf, with detailed designs unveiled in Sec. IV.

A. Architecture and Key Modules 

vSkyConf enables efficient, peer-to-peer style, multi-party
mobile video conferencing via an IaaS cloud, with
the architecture presented in Fig. 1. We refer to a video
conference call among multiple mobile users as a session.
The user who starts the conference call is the initiator of
the session. Each user in a session produces a video stream
via the camera on its mobile device, sends the stream to the other
users, and receives the streams produced by all the other
users.

A surrogate, i.e., a virtual machine (VM) instance, is created
in the IaaS cloud for each mobile user. The IaaS cloud consists
of disparate data centers in different geographic locations, and
the surrogate for each mobile user is placed in a data center
close to the user. As a proxy for the mobile device, a
mobile user's surrogate is responsible for the following: (i)
session maintenance, by exchanging control messages with
other surrogates in a timely and efficient manner; (ii) video
dissemination and transcoding, by receiving the video stream
its mobile user produces, transcoding it into appropriate
format(s), and distributing it to the other users' surrogates, and
the reverse for streams destined to its own user; (iii) efficient
video buffering for its mobile user, for timely, smooth and robust
streaming to the corresponding device. A mobile user just needs
to send the stream it generates to its surrogate and receive the
streams others produce from it, and is thus freed from
power-consuming processing and communication. A gateway server
in vSkyConf loosely keeps track of participating users and their
surrogates, and can be implemented as a standalone server
or as VMs in the IaaS cloud.
Fig. 2. The key modules of a surrogate.

The key modules implemented on a single surrogate are
depicted in Fig. 2, and can be divided into two parts: the
control plane and the data plane.

Control Plane is the brain of the surrogate, responsible 
for control signaling between this surrogate and neighboring
surrogates. It measures the latencies and bandwidths on
the connections from/to neighboring surrogates, and all the 
collected information is stored in the "peer table", which 
constructs a partial view of the video conferencing topology 
from this surrogate's point of view. Utilizing the collected 
information, the surrogate computes routing paths for streams 
from its corresponding mobile user to other mobile users, and 
participates in the construction of optimal video dissemination 
trees. It also monitors the call qualities and determines the best 
video encoding parameters (codecs, bitrates, etc.) for streams 
from/to its mobile device. 

Data Plane is responsible for processing in/out video 
streams, in terms of both transcoding and forwarding, as 
directed by the control plane. The video stream from its 
mobile user is captured continuously and disseminated to other 
surrogates after necessary transcoding. In the reverse direction, 
all video streams from other mobile users, via their respective 
surrogates, are transcoded into appropriate rates (if necessary) 
and delivered to the mobile user by a key module "jitter 
mask", which deals with random jitters caused by fluctuations 
of processing and network latencies, as well as any anomalies 
along the dissemination paths. 

B. Design Objectives 

Our design of vSkyConf observes the following principles. 

Decentralized Control. Except for necessary bootstrapping
from the gateway server, vSkyConf aims to rely as little as
possible on central control for session maintenance and
route computation. Each session is maintained by the
surrogate of its initiator, in order to
provide good scalability and flexibility. The video routing and
transcoding decisions are made in a fully distributed
fashion through collaboration among surrogates. Since
mobile users can join and leave the system dynamically, failover
mechanisms are also integrated to guarantee the
robustness of each session.

Self-Evolving Routing Topology with Full Adaptivity.
The best routing topology for disseminating the stream from
each participant should be built among the surrogates in a
conference session, one that achieves a small end-to-end latency
to each of the other users and fully exploits the available
bandwidths among the surrogates. Transcoding decisions to
convert the original stream to formats/bit rates acceptable to
the recipients should be made at the best point along
the dissemination paths, according to the different computation
capacities of the surrogates and the needs of downstream mobile
devices. The routing paths and transcoding points should
evolve dynamically, according to the current bandwidth
and latency among surrogates and the wireless connectivity to
the mobile users. vSkyConf proposes a dynamic routing and
transcoding algorithm to achieve these objectives.

Synchronized Playback. A key challenge in a
multi-party video call is to keep all the streams played back
synchronously, without any noticeably lagging or leading
streams. A straightforward approach to offsetting the skew
among multiple streams, which vSkyConf also adopts, is to
impose a latency that forces the leading streams to wait for
the lagging ones. That, however, leads to an associated
synchronization problem in a different dimension, due to
different "wall clocks" at different mobile users. Traditional
solutions seek help from Network Time Protocol (NTP) [14]
servers to adjust each user's system time. In contrast, we
design a simple mechanism that allows all the
participating surrogates to calibrate the skew of their own
clocks against that of the session initiator.
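
To make the wait-for-the-lagging-stream idea concrete, the following is a minimal sketch, assuming each surrogate already knows the end-to-end delay of every stream it delivers, measured on the initiator-calibrated clock. The function name and millisecond units are illustrative assumptions, not vSkyConf's implementation.

```python
from typing import Dict


def playout_holds(arrival_delay_ms: Dict[str, float]) -> Dict[str, float]:
    """Extra hold time per stream so that all streams play out together.

    arrival_delay_ms maps a source user to the end-to-end delay of its
    stream, assumed to be measured on the initiator-calibrated clock.
    The most lagging stream is held for 0 ms; every other stream waits
    for the difference between its delay and the largest delay.
    """
    target = max(arrival_delay_ms.values())
    return {src: target - d for src, d in arrival_delay_ms.items()}
```

For example, with delays of 120, 180 and 150 ms for the streams from users a, b and d, the sketch holds them for 60, 0 and 30 ms, so all three play out 180 ms after capture.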

Robust, Smooth Video Streaming. In a practical distributed
system, various random events may happen. For
instance, upstream surrogates in a dissemination path may
suddenly drop offline or may misreport their link bandwidths
and latencies due to measurement errors. To guarantee
smooth stream playback at each mobile user even in cases of
inaccurate route computation, vSkyConf uses an error
correction mechanism that searches for better call routing
paths before the call quality drops, by monitoring a carefully
designed jitter buffer against pre-configured thresholds.
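
A hypothetical sketch of such threshold-driven monitoring follows. The buffer internals are not specified at this point in the text, so the class name, the two thresholds low_ms and high_ms, and the callback that would start a route search are all assumptions for illustration.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class JitterBuffer:
    """Hypothetical per-stream playout buffer on a surrogate.

    low_ms and high_ms are assumed pre-configured thresholds on the media
    time held in the buffer. When playback drains the buffer below low_ms,
    on_low() is called once (e.g. to start searching for a better route);
    the alert re-arms after the level recovers to low_ms or more.
    """

    def __init__(self, low_ms: float, high_ms: float,
                 on_low: Callable[[], None]) -> None:
        self.low_ms = low_ms
        self.high_ms = high_ms
        self.on_low = on_low
        self.frames: Deque[Tuple[float, bytes]] = deque()
        self._alerted = False

    def buffered_ms(self) -> float:
        # Media time spanned by buffered frames (newest minus oldest timestamp).
        if len(self.frames) < 2:
            return 0.0
        return self.frames[-1][0] - self.frames[0][0]

    def push(self, ts_ms: float, payload: bytes) -> None:
        self.frames.append((ts_ms, payload))
        # Bound the added latency: drop the oldest frames beyond high_ms.
        while self.buffered_ms() > self.high_ms:
            self.frames.popleft()

    def pop(self) -> Optional[Tuple[float, bytes]]:
        frame = self.frames.popleft() if self.frames else None
        level = self.buffered_ms()
        if level < self.low_ms and not self._alerted:
            self._alerted = True
            self.on_low()  # e.g. trigger a search for better routing paths
        elif level >= self.low_ms:
            self._alerted = False
        return frame
```

Raising the alert only once per underrun keeps a draining buffer from flooding the control plane with repeated route searches.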

IV. Detailed Design 

We now present the detailed design of vSkyConf, which realizes the
design principles presented in Sec. III.

A. Session Maintenance 

Establishment: When a mobile user logs in to the vSkyConf
system via the gateway server, it is assigned a surrogate VM.
The gateway can maintain information on a pool of available,
pre-initialized VMs in the IaaS cloud, and assign one from
the pool to a mobile user based on their geographic proximity,
to expedite the service. The surrogate of the session
initiator obtains from the gateway server the IP addresses of the
surrogates of the other online users it wishes to invite to a
video conferencing session. The initiator then no longer relies
on the gateway server: it contacts and invites the interested
participants through their surrogates directly, and maintains
a list of IP addresses of all active surrogates in the session.
Each participant sends periodic "heartbeat" messages to the
session initiator, and receives a time-stamped "ack" from the
initiator, which is used to calibrate the skew of its local "clock"
against the initiator's, as shown in Fig. 3. The updated lists of
IP addresses are also periodically broadcast from the initiator to
all active participant surrogates. In this way, the load on
the gateway server is significantly alleviated by initiators of
different conferencing sessions, and one gateway server can
support many concurrent conferencing sessions in the system.
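
The heartbeat/ack exchange above does not state how the time-stamped ack is turned into a skew estimate. One common choice, sketched below as an assumption rather than vSkyConf's actual method, is a Cristian-style estimate: assume the heartbeat and the ack take equal one-way delays, so the initiator's timestamp corresponds to the midpoint of the local send and receive times. HeartbeatSample, estimate_offset and to_initiator_time are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HeartbeatSample:
    t_send: float       # local clock when the heartbeat left (ms)
    t_initiator: float  # initiator's timestamp carried in the "ack" (ms)
    t_recv: float       # local clock when the "ack" arrived (ms)


def estimate_offset(samples: Iterable[HeartbeatSample]) -> float:
    """Offset to add to the local clock to express it on the initiator's clock.

    Cristian-style estimate: assume equal one-way delays, so the initiator's
    timestamp matches the midpoint of send and receive. The sample with the
    smallest round-trip time is used, since queuing distorts it the least.
    """
    best = min(samples, key=lambda s: s.t_recv - s.t_send)
    midpoint = (best.t_send + best.t_recv) / 2
    return best.t_initiator - midpoint


def to_initiator_time(local_ms: float, offset_ms: float) -> float:
    """Convert a local timestamp to the initiator's time base."""
    return local_ms + offset_ms
```

Because every participant calibrates against the same reference (the initiator), stream delays measured at different surrogates become comparable without contacting an external time server.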

Tear-down: When a mobile user leaves the system, its
surrogate VM is released and returned to the pool of available
VMs in the IaaS cloud. If the initiator of a session departs, its
hosting role is handed over to another surrogate in
the participant list.
Fig. 3. Clock synchronization among different surrogates.

B. Routing Computation 

In a video conferencing session with S users, there are S 
streams, each produced by one of the mobile users, to be 
delivered to all the other users. For example, there are four 
streams in Fig. 1, where the stream produced by one user
is to be distributed to all the other three. It is important to 
construct an efficient dissemination topology for each of the 
streams, to maximize the receiving rates while guaranteeing 
small end-to-end latencies. In addition, different participants 
may require different video formats/bitrates, leading to the 
following challenges: what is the best format/bitrate the source 
mobile user should send its stream at, considering needs of the 
receivers and bandwidth availability both among the surrogates 
and at the last-mile wireless links? If transcoding is necessary, 
at which surrogate(s) should transcoding take place along the 
dissemination paths, such that one transcoded stream can be 
useful for multiple downstream users? 

We next model a mathematical optimization problem for 
constructing efficient dissemination topologies of all streams 
in a session and deciding the optimal transcoding locations. 
We then design an efficient, fully distributed heuristic to approach
the optimal solution in a dynamic system. For transcoding,
we only consider down-sampling of a stream, i.e.,
the reduction of streaming bit rate, but not the reverse, since 
up-sampling provides no quality improvement but consumes 
unnecessary bandwidth. We also focus on transcoding due to 
mismatched bit rates of streams of the same format, while the 
case of transcoding from one format to another can be readily 
addressed with similar efforts. 

1) Optimization Formulation: Let graph $G = (\mathcal{S}, \mathcal{E})$ represent
the network of surrogates in a session, where $\mathcal{S}$ is the
set of surrogates and $\mathcal{E}$ is the set of directed connections
among the surrogates. For each surrogate $m \in \mathcal{S}$, let $\hat{m}$
represent the corresponding mobile user. Let $S = |\mathcal{S}|$. Suppose
$C_{ij}$ is the maximum available bandwidth on link $(i, j) \in \mathcal{E}$,
and $d_{ij}$ denotes the link latency. We refer to the stream
from a surrogate $m \in \mathcal{S}$ as flow $m$, with source rate $R^{(m)}$,
which is the rate of the incoming stream from mobile user $\hat{m}$
to surrogate $m$, determined by the source capturing rate of
the user's mobile camera and the uplink rate from the mobile
user. Let $R_n^{(m)}$ be the maximum acceptable bit rate of flow
$m$ at mobile user $\hat{n}$, as decided by the last-mile downlink
bandwidth from surrogate $n$ to $\hat{n}$ and the allocation of this
downlink bandwidth among streams from different users; e.g.,
if user $\hat{n}$ sizes the playback windows of streams from the $S-1$ other
conference participants equally on its device screen, $1/(S-1)$ of
the downlink bandwidth should be allocated to each stream.
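
The equal-split example above generalizes naturally if each stream's share of the downlink is made proportional to the screen area of its playback window. The sketch below is a hypothetical illustration of that allocation; the proportional rule and the function name are our assumptions, since the text only specifies the equal case.

```python
from typing import Dict


def downlink_shares(downlink_kbps: float,
                    window_areas: Dict[str, float]) -> Dict[str, float]:
    """Split user n's last-mile downlink among its incoming streams.

    Each share is proportional to the screen area of that stream's playback
    window (an assumed rule). With equal windows, every one of the S - 1
    incoming streams gets downlink_kbps / (S - 1), as in the text's example.
    """
    total = sum(window_areas.values())
    return {src: downlink_kbps * area / total
            for src, area in window_areas.items()}
```

With a 1536 kbps downlink and three equally sized windows, each stream's maximum acceptable rate is 512 kbps.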

As a known modeling technique, the multicast flow $m$
from surrogate $m$ to all other surrogates can be viewed as
consisting of $S-1$ conceptual unicast flows [15], from $m$ to
each of the other surrogates, respectively. These conceptual
unicast flows co-exist in the network without contending for
link bandwidth, and the multicast flow rate on a link is the
maximum of the rates of all the unicast flows going along this
link. For ease of practical implementation, we restrict each
unicast flow from $m$ to $n$ to be an integral flow along one path
(i.e., we do not allow fractional unicast flows along different paths),
with end-to-end rate $r_n^{(m)}$, and the multicast topology is the
overlap of all the $S-1$ unicast flow paths. Let binary variable
$f_{ij}^{(mn)}$ indicate whether the conceptual unicast flow from $m$ to
$n$ traverses link $(i, j) \in \mathcal{E}$, and let $c_{ij}^{(m)}$ denote the actual rate of
the multicast flow $m$ on link $(i, j)$.
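
The relation between the conceptual unicast flows and the actual multicast flow can be sketched as follows, assuming each unicast flow from m to n is given as a single path together with its end-to-end rate. A link carries the maximum rate among the unicast flows routed over it, the source uploads at the largest end-to-end rate, and a surrogate whose egress rate on some path is lower than its ingress rate is a transcoding point. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def multicast_link_rates(paths: Dict[str, List[str]],
                         rates: Dict[str, float]) -> Dict[Link, float]:
    """Rate of multicast flow m on each link it uses.

    paths[n] is the single path [m, ..., n] of the conceptual unicast flow
    from m to n, and rates[n] is its end-to-end rate. A link carries the
    maximum rate among the unicast flows routed over it.
    """
    link_rate: Dict[Link, float] = defaultdict(float)
    for n, path in paths.items():
        for i, j in zip(path, path[1:]):
            link_rate[(i, j)] = max(link_rate[(i, j)], rates[n])
    return dict(link_rate)


def transcoding_points(paths: Dict[str, List[str]],
                       link_rate: Dict[Link, float]) -> Set[str]:
    """Surrogates whose egress rate is lower than their ingress rate on a path."""
    points = set()
    for path in paths.values():
        for i, j, k in zip(path, path[1:], path[2:]):
            if link_rate[(i, j)] > link_rate[(j, k)]:
                points.add(j)
    return points
```

For a flow from a with paths a-b, a-b-c and a-d at 512, 256 and 384 kbps, link (a, b) carries 512 kbps, surrogate b down-samples to 256 kbps for c, and user a uploads at 512 kbps.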

Let function $\varphi_n(r_1, r_2)$ give the transcoding latency at
surrogate $n$ if the rate $r_1$ of an ingress flow received by
$n$ is higher than the rate $r_2$ of the egress flow from $n$;
$\varphi_n(r_1, r_2) = 0$ if $r_1 \le r_2$. Typical transcoding steps are
to decode the source stream of rate $r_1$ to an intermediate
format, and then re-encode the stream from the intermediate
format to the destination rate $r_2$ [16]. Hence, the transcoding delay
$\varphi_n(r_1, r_2)$ is monotonically increasing in both $r_1$ and $r_2$, and
depends on the computation capacity of the surrogate VM $n$: the
more powerful the VM is, the faster the transcoding can be
accomplished.
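
The text only requires the transcoding delay to be zero when no down-sampling is needed, to grow with both rates, and to shrink on faster VMs. A hypothetical linear cost model with those properties is sketched below; the per-kbps coefficients and the vm_speed factor are placeholders, not measurements.

```python
def transcoding_latency_ms(r_in_kbps: float, r_out_kbps: float,
                           vm_speed: float,
                           decode_ms_per_kbps: float = 0.05,
                           encode_ms_per_kbps: float = 0.15) -> float:
    """Hypothetical phi_n(r1, r2): decode cost grows with r1, re-encode with r2.

    vm_speed is an assumed relative speed factor of surrogate VM n
    (1.0 = baseline); the per-kbps coefficients are placeholders.
    """
    if r_in_kbps <= r_out_kbps:
        return 0.0  # no down-sampling needed, so no transcoding delay
    work = decode_ms_per_kbps * r_in_kbps + encode_ms_per_kbps * r_out_kbps
    return work / vm_speed
```

In practice such a model would have to be fitted per VM type and codec; the linear form is used here only because it satisfies the monotonicity stated above.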

The quality of service in the conferencing session relies
on two aspects: (i) the end-to-end latency and (ii) the flow
rate received by each participant for each flow. We bound the
end-to-end latency, from the time a source surrogate $m$ emits
flow $m$ to the time a receiver surrogate $n$ is ready to push
the stream to its corresponding mobile user, by $L_n^{(m)}$, whose
value is dynamically set as in Sec. IV-C. Let $U(\cdot)$ be
an increasing, concave utility function of the rate of flow $m$
received by surrogate $n$, $r_n^{(m)}$. We maximize the aggregate
utility of all receivers in all flows as our objective. The
optimization problem is formulated in (1).

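$$\max \sum_{m \in \mathcal{S}} \sum_{n \in \mathcal{S},\, n \neq m} U\big(r_n^{(m)}\big) \tag{1}$$

subject to:

$$\sum_{i:(i,j)\in\mathcal{E}} f_{ij}^{(mn)} - \sum_{k:(j,k)\in\mathcal{E}} f_{jk}^{(mn)} = \beta_j^{(mn)}, \quad \forall j, m, n \in \mathcal{S},\ m \neq n, \tag{2}$$

$$f_{ij}^{(mn)}\, r_n^{(m)} \le c_{ij}^{(m)}, \quad \forall (i,j) \in \mathcal{E},\ m, n \in \mathcal{S},\ m \neq n, \tag{3}$$

$$\sum_{m \in \mathcal{S}} c_{ij}^{(m)} \le C_{ij}, \quad \forall (i,j) \in \mathcal{E}, \tag{4}$$

$$\sum_{(i,j)\in\mathcal{E}} d_{ij} f_{ij}^{(mn)} + \sum_{(i,j),(j,k)\in\mathcal{E}} f_{ij}^{(mn)} f_{jk}^{(mn)}\, \varphi_j\big(c_{ij}^{(m)}, c_{jk}^{(m)}\big) + \varphi_n\Big(\sum_{j:(j,n)\in\mathcal{E}} f_{jn}^{(mn)} c_{jn}^{(m)},\ R_n^{(m)}\Big) \le L_n^{(m)}, \quad \forall m, n \in \mathcal{S},\ m \neq n, \tag{5}$$

$$f_{ij}^{(mn)} \in \{0, 1\}, \quad \forall m, n \in \mathcal{S},\ m \neq n,\ (i,j) \in \mathcal{E}, \tag{6}$$

$$0 \le r_n^{(m)} \le R^{(m)}, \quad \forall m, n \in \mathcal{S},\ m \neq n, \tag{7}$$

$$0 \le r_n^{(m)} \le R_n^{(m)}, \quad \forall m, n \in \mathcal{S},\ m \neq n, \tag{8}$$

where

$$\beta_j^{(mn)} = \begin{cases} -1, & j = m, \\ 1, & j = n, \\ 0, & \text{otherwise.} \end{cases}$$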



Constraints (2) and (6) enforce a single path for the unicast
flow from surrogate $m$ to $n$, and ensure flow conservation
along the path. Constraint (3) implies that the unicast
flow from $m$ to $n$ with rate $r_n^{(m)}$ is conceptual, "hidden"
in the actual multicast flow $m$ with rate $c_{ij}^{(m)}$, on each
link $(i, j)$. Constraint (4) requires that the overall rate of
actual flows from different sources should not exceed the
capacity of each link. Constraint (5) bounds the end-to-end
delay along the path from source surrogate $m$ to receiver
surrogate $n$, which consists of three parts: (i) the overall
link delay along the path, $\sum_{(i,j)\in\mathcal{E}} d_{ij} f_{ij}^{(mn)}$; (ii) the
potential transcoding delay at intermediate surrogates $j$ along
the path, $\sum_{(i,j),(j,k)\in\mathcal{E}} f_{ij}^{(mn)} f_{jk}^{(mn)} \varphi_j(c_{ij}^{(m)}, c_{jk}^{(m)})$, where a
surrogate $j$ is on the path if there exist neighboring links $(i, j)$
and $(j, k)$ such that $f_{ij}^{(mn)} = 1$ and $f_{jk}^{(mn)} = 1$, and a transcoding
delay occurs if the flow rate on $(i, j)$, $c_{ij}^{(m)}$, is larger than the
flow rate on $(j, k)$, $c_{jk}^{(m)}$; and (iii) the potential transcoding delay
at surrogate $n$, $\varphi_n(\sum_{j:(j,n)\in\mathcal{E}} f_{jn}^{(mn)} c_{jn}^{(m)}, R_n^{(m)})$, to transcode
the received stream to the maximum receiving rate allowed at
mobile user $\hat{n}$, if needed. Constraints (7) and (8) restrict the
end-to-end rate of the virtual unicast flow from surrogate $m$ to $n$
to be no larger than the maximum sending rate from mobile
user $\hat{m}$ and the maximum receiving rate at mobile user $\hat{n}$.
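
To make constraints (4) and (5) concrete, the sketch below evaluates them for a candidate solution, assuming each conceptual unicast flow is given as an explicit path (which fixes the binary path variables) and that the caller supplies a transcoding-delay function taking the surrogate, ingress rate and egress rate. It is a checking aid under these assumptions, not a solver; all names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Link = Tuple[str, str]
Phi = Callable[[str, float, float], float]  # (surrogate, r_in, r_out) -> ms


def path_delay(path: List[str], d: Dict[Link, float],
               c_m: Dict[Link, float], phi: Phi, R_n: float) -> float:
    """Left-hand side of constraint (5) for the unicast path [m, ..., n]."""
    links = list(zip(path, path[1:]))
    link_delay = sum(d[e] for e in links)                     # term (i)
    relay = sum(phi(j, c_m[(i, j)], c_m[(j, k)])              # term (ii)
                for (i, j), (_, k) in zip(links, links[1:]))
    last = phi(path[-1], c_m[links[-1]], R_n)                 # term (iii)
    return link_delay + relay + last


def capacity_ok(flow_rates: Dict[str, Dict[Link, float]],
                capacity: Dict[Link, float]) -> bool:
    """Constraint (4): summed multicast rates on each link stay within capacity."""
    load: Dict[Link, float] = defaultdict(float)
    for rates in flow_rates.values():
        for e, r in rates.items():
            load[e] += r
    return all(load[e] <= capacity[e] for e in load)
```

A candidate path is admissible for receiver n when path_delay(...) does not exceed n's delay bound for flow m.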

The solutions to the optimization problem, $r_n^{(m)}$, $c_{ij}^{(m)}$,
$f_{ij}^{(mn)}$, $\forall m, n \in \mathcal{S}, n \neq m, (i,j) \in \mathcal{E}$, give us (i) the rate
at which each mobile user $\hat{m}$ should send its stream to its
surrogate $m$, which is the maximum of all conceptual unicast
flow rates from $m$ to the other surrogates, $\max_{n\in\mathcal{S}, n\neq m} r_n^{(m)}$;
(ii) the delivery rate of flow $m$ along each link $(i, j)$ and hence
the flow routing topology among the surrogates ($c_{ij}^{(m)} = 0$
indicates that flow $m$ is not to be routed over link $(i, j)$); and (iii)
where the transcoding of each flow $m$ should happen, i.e., a
surrogate $j$ where an egress flow rate $c_{jk}^{(m)}$ is smaller than
the ingress rate $c_{ij}^{(m)}$ along the same conceptual unicast path
should transcode flow $m$ to the lower rate.

Algorithm 1 Flow Routing and Rate Allocation

1: Construct shortest-path trees $T^{(m)}$ from each surrogate $m$, $\forall m \in \mathcal{S}$;
2: if $\exists m, n \in \mathcal{S}$: $l_n^{(m)} > L_n^{(m)}$ then
3:     No feasible solution exists; return;
4: end if
5: $N_{ij}$ := number of dissemination trees on link $(i, j)$, $\forall (i,j) \in \mathcal{E}$;
6: $\forall (a,b) \in T^{(m)}$: $c_{ab}^{(m)} := \min_{n\in\mathcal{S},\,(i,j)\in T^{(m)}} \{R_n^{(m)}, C_{ij}/N_{ij}\}$;
7: Search for better routing paths, following Alg. 2.

2) Distributed Heuristic: The optimization problem (1)
is non-convex with integer variables, and is thus very difficult to
solve exactly. We design an efficient heuristic
algorithm, given in Alg. 1 and Alg. 2, to decide flow routing,
rate assignment, and transcoding in a fully distributed fashion.

We first decide a basic, feasible dissemination topology for
each flow $m$, on which the end-to-end delay constraint for each
receiver, constraint (5), is satisfied. Though the optimization
problem (1) does not restrict the topologies to trees, we choose
to construct a dissemination tree for each flow for ease of
practical implementation. For conciseness, $l_n^{(m)}$ represents the
overall latency (including both link and necessary transcoding
latencies) of flow $m$ from surrogate $m$ to surrogate $n$. A
shortest-path tree is constructed from surrogate $m$ to all the
other surrogates, using a distributed Bellman-Ford algorithm
[17] (Line 1 in Alg. 1). If the overall link latency on the
path from surrogate $m$ to $n$ is larger than $L_n^{(m)}$, we know
that this pre-set end-to-end latency bound cannot be
satisfied, and it should be adjusted to a more reasonable value
(Lines 2-4). We then decide a basic end-to-end rate of flow
$m$ on this shortest-path tree, from surrogate $m$ to all the other
surrogates: the capacity $C_{ij}$ of each link $(i, j)$ is evenly divided
among the (actual) flows generated by different surrogates that
pass through this link, and the end-to-end rate of each flow $m$ is
set to the rate allocated to this flow on the bottleneck link its
shortest-path tree spans (Lines 5-6).
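
The basic allocation in Lines 1-6 of Alg. 1 can be simulated centrally as below. As assumptions for illustration, Dijkstra's algorithm on link latencies stands in for the distributed Bellman-Ford computation, the feasibility check uses link latencies only (as in Lines 2-4), and each flow's basic rate is additionally capped by the smallest maximum acceptable rate among its receivers (our reading of Line 6). All names are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]


def shortest_path_tree(src: str, latency: Dict[Link, float]
                       ) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Dijkstra on link latencies: returns (distance, parent) maps from src."""
    adj: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (i, j), w in latency.items():
        adj[i].append((j, w))
    dist = {src: 0.0}
    parent: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if du + w < dist.get(v, float("inf")):
                dist[v] = du + w
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, parent


def basic_allocation(surrogates: List[str], latency: Dict[Link, float],
                     capacity: Dict[Link, float],
                     bound: Dict[Tuple[str, str], float],
                     max_rate: Dict[Tuple[str, str], float]
                     ) -> Optional[Tuple[Dict[str, Set[Link]], Dict[str, float]]]:
    """Lines 1-6 of Alg. 1, computed centrally for illustration only."""
    trees: Dict[str, Set[Link]] = {}
    for m in surrogates:                                   # Line 1
        dist, parent = shortest_path_tree(m, latency)
        for n in surrogates:                               # Lines 2-4
            if n != m and dist.get(n, float("inf")) > bound[(m, n)]:
                return None  # delay bound unsatisfiable
        trees[m] = {(p, child) for child, p in parent.items()}
    n_trees: Dict[Link, int] = defaultdict(int)            # Line 5: N_ij
    for edges in trees.values():
        for e in edges:
            n_trees[e] += 1
    rates: Dict[str, float] = {}
    for m, edges in trees.items():                         # Line 6
        fair_share = min(capacity[e] / n_trees[e] for e in edges)
        receiver_cap = min(max_rate[(m, n)] for n in surrogates if n != m)
        rates[m] = min(fair_share, receiver_cap)
    return trees, rates
```

For instance, if two trees share a 512 kbps link, both flows start at 256 kbps, while a flow whose tree avoids that link keeps its full link share; Alg. 2 then tries to raise the lower rates.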

Based on the basic dissemination topology, each surrogate then carries out dynamic edge and rate adjustments, in order to make maximal use of the available capacity to deliver high-quality streams without violating the latency constraints. For each flow m, suppose surrogate j is the parent of surrogate n on the current dissemination tree of flow m. Surrogate n contacts other neighboring surrogates in the flow to discover whether there is a better path, with higher capacity, from source surrogate m via another parent k. It compares the current receiving rate $c_{jn}^{(m)}$ from j with the potential receiving rate from k, $\min(c_{ik}^{(m)}, C_{kn})$, where surrogate i is the parent of k in the current tree and $C_{kn}$ is the remaining available bandwidth on link (k, n) (Line 2). If the potential receiving rate via k is larger, n further evaluates the increased latency along the new path, caused by changes of link latencies and potential transcoding latencies at k and n. Only if the latency of the new path from m to n, i.e., $\omega_n^{(m)}$, is still within $L_n^{(m)}$, and the updated latency of each descendant surrogate of n on the tree is still within its respective delay bound, can n safely change its parent from j to k (Lines 3-6).

Algorithm 2 Self-Evolving Route/Rate Adjustment at Surrogate n in Flow m

1: while $\exists (j, n) \in T^{(m)}$ with $c_{jn}^{(m)}$ below the rate requested by n do
2:   if $\exists (i, k) \in T^{(m)}$, $\min\{c_{ik}^{(m)}, C_{kn}\} > c_{jn}^{(m)}$ then
3:     $A := \{n\} \cup \{q : q \in T_n^{(m)}\}$;  (n and its descendants in $T^{(m)}$)
4:     if $\forall p \in A$, $\omega_p^{(m)} \le L_p^{(m)}$ then
5:       $T^{(m)} := T^{(m)} - (j, n) + (k, n)$;
6:     end if
7:   end if
8: end while
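A single parent-switch test of Alg. 2 can be sketched as below. The function `should_switch_parent` and its arguments are hypothetical; in particular, we assume the caller has already predicted the post-switch latencies $\omega_p^{(m)}$ of n and its descendants (the set A), including link and transcoding delays, which the sketch does not model.

```python
def should_switch_parent(cur_rate, cand_upstream_rate, cand_link_bw,
                         new_latencies, bounds):
    """One parent-switch test of Alg. 2 at surrogate n (hypothetical helper).

    cur_rate:           c_jn^(m), rate currently received from parent j (kbps)
    cand_upstream_rate: c_ik^(m), rate candidate k receives from its parent i
    cand_link_bw:       C_kn, remaining bandwidth on link (k, n)
    new_latencies:      {p: omega_p^(m) after the switch} for every p in A,
                        i.e. n and all of its descendants (caller-predicted)
    bounds:             {p: L_p^(m)}, delay bound of each surrogate in A
    """
    # Line 2: the path via k must offer a strictly higher receiving rate.
    if min(cand_upstream_rate, cand_link_bw) <= cur_rate:
        return False
    # Line 4: every surrogate in A must remain within its delay bound.
    return all(new_latencies[p] <= bounds[p] for p in new_latencies)


# Fig. 4-style usage with made-up latencies (ms): c receives flow b at 256 kbps;
# assume d receives flow b at 1024 kbps and link (d, c) has 512 kbps left.
print(should_switch_parent(256, 1024, 512, {"c": 180}, {"c": 250}))  # True
print(should_switch_parent(256, 1024, 512, {"c": 300}, {"c": 250}))  # False
```

The rate test comes first because it is cheap; the latency check over all of A is only needed once a faster parent has been found.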

We illustrate the algorithm using a simple example in Fig. 4. There are three shortest-path trees reaching c, emanating from a, b and d, respectively (shown in Fig. 4(1)-(3)). Fig. 4(4) shows the three flows reaching c together. Suppose the only bandwidth bottleneck lies in link (a, c), with a capacity of 512 kbps, and all other links have a capacity of 1024 kbps. The basic rate of both flow a and flow b received by c is then 256 kbps. Next, c finds a better path for flow b via d, with a higher available bandwidth of 512 kbps, and c relocates the routing path of flow b after it ensures that the latency constraints are not compromised.







Fig. 4. A simple example to illustrate the self-evolving flow routing and rate 
allocation algorithm. 

The algorithm can be carried out in a completely decentralized fashion. Each surrogate dynamically measures the link conditions (bandwidth, delay) to its neighbouring surrogates. In our prototype implementation, latencies are measured using "ping" messages; bandwidth availability is estimated based on past stream transmission experience; routing path information is spread via messages exchanged between neighboring surrogates. It is worth noting that the adjustments at surrogates are carried out for different flows asynchronously. Such "randomness" allows bandwidth on a link to be shared among different flows, rather than monopolized by a few. We have carefully studied the correctness of our distributed heuristic, with theorems showing that the routing paths incur no cycles and are always feasible (guaranteeing end-to-end delay bounds). Please see the appendix for details.
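As one possible way to maintain these per-neighbour measurements, the sketch below smooths throughput and ping samples with an exponentially weighted moving average (EWMA). The EWMA itself, the class name `LinkEstimator`, the smoothing factor, and approximating one-way latency as half the ping round-trip time are our assumptions for illustration; the text only states that latency comes from "ping" messages and bandwidth from past transmissions.

```python
class LinkEstimator:
    """EWMA estimates for one neighbouring link (hypothetical helper)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha           # weight of the newest sample (assumed)
        self.bandwidth_kbps = None   # smoothed available-bandwidth estimate
        self.latency_ms = None       # smoothed one-way latency estimate

    def _smooth(self, old, sample):
        # The first sample initialises the estimate; later ones are blended in.
        if old is None:
            return sample
        return (1 - self.alpha) * old + self.alpha * sample

    def on_transfer(self, bytes_sent, seconds):
        """Record a past stream transmission of bytes_sent over `seconds`."""
        sample_kbps = bytes_sent * 8 / 1000 / seconds
        self.bandwidth_kbps = self._smooth(self.bandwidth_kbps, sample_kbps)

    def on_ping(self, rtt_ms):
        """Record a ping reply; one-way latency taken as RTT / 2 (assumed)."""
        self.latency_ms = self._smooth(self.latency_ms, rtt_ms / 2)


est = LinkEstimator(alpha=0.5)
est.on_transfer(64000, 1.0)    # 512 kbps observed
est.on_transfer(128000, 1.0)   # 1024 kbps observed
est.on_ping(100)
print(est.bandwidth_kbps, est.latency_ms)  # 768.0 50.0
```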
Fig. 5. An illustration of the end-to-end delay for flow m in vSkyConf.

C. Jitter Masking 

In multi-party video conferencing, a user receives multiple streams from different senders. Synchronization among the different streams received at all users is crucial to users' perceived quality of experience. Ideally, video frames captured at all users at the same time are played at all the recipient user devices at the same time. For this purpose, we design an effective buffering mechanism at the surrogates, which collaborates with the routing algorithms.

Surrogate n maintains a buffer $b_n^{(m)}$ for each stream $m \in S \setminus \{n\}$ from each of the other surrogates. The buffer holds video packets of flow m, ready to be delivered to mobile device n. vSkyConf enforces an end-to-end delay of D, from the time a video frame is captured at one mobile device to the time it is synchronously played at all the other mobile devices.¹ Let $\Delta_m$ denote the delay between mobile device m and its surrogate m, $\forall m \in S$. A frame in buffer $b_n^{(m)}$ that is produced at time t at source m will be pushed out of the buffer no earlier than $t + d_n^{(m)}$, where $d_n^{(m)} = D - \Delta_m - \Delta_n$, in order to guarantee playback of the frame at mobile device n at $t + D$ (Fig. 5). Note that we seek to simplify the buffering mechanism at the mobile devices in vSkyConf, leaving the main buffering tasks to the surrogates.
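The buffer release rule can be written directly from the definitions above. This is a minimal sketch with a hypothetical function name and millisecond units assumed; D = 400 ms follows Sec. V, while the first- and last-mile delays in the example are made up.

```python
def release_time(t_produced, D, delta_src, delta_dst):
    """Earliest time surrogate n may push a frame of flow m out of b_n^(m).

    Follows the text: d_n^(m) = D - Delta_m - Delta_n, release at t + d_n^(m).
    All values in milliseconds (assumed unit).
    """
    d = D - delta_src - delta_dst
    if d < 0:
        # The first- and last-mile delays alone already exceed the budget D.
        raise ValueError("Delta_m + Delta_n exceeds the end-to-end delay D")
    return t_produced + d


# D = 400 ms as in Sec. V; Delta_m = 40 ms and Delta_n = 50 ms are made up.
print(release_time(1000, 400, 40, 50))  # 1310
```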

If there were no jitter in the cloud, we could set the delay bound $L_n^{(m)}$ in the optimization used to find the routing path from surrogate m to surrogate n to $L_n^{(m)} = d_n^{(m)}$, and rest assured that the buffer would never starve. However, in a practical system, jitter may occur for various reasons, e.g., variation of the transcoding delay at surrogates, or inaccurate estimates of link delay and bandwidth when running our routing algorithm. Hence, $L_n^{(m)}$ in the optimization for route selection should be set smaller than $d_n^{(m)}$, in order to absorb the inaccuracy and jitter.

A series of solid measurement studies [18] have shown that jitter on a network path approximately follows a normal distribution [19]. Let $J_n^{(m)}$ be a random variable representing the path delay from surrogate m to surrogate n, such that $J_n^{(m)} \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma$ is the standard deviation. For a normal distribution, about 99.97% of the samples fall within the range $(-\infty, \mu + 3.4\sigma)$. If we set $L_n^{(m)}$ to the mean $\mu$ of the path delay distribution while allowing $d_n^{(m)} = \mu + 3.4\sigma$, we derive $L_n^{(m)} = d_n^{(m)} - 3.4\sigma$.

¹ The value of D can be set based on a reasonable estimate of the maximum delay between two mobile users in the system, and should fall within the acceptable delay range for real-time communication.



Using this $L_n^{(m)}$ in solving optimization (1), we can make sure that 99.97% of the video packets following the selected path can be sent out from surrogate n by $t + d_n^{(m)}$, and thus meet their playback deadlines at mobile device n.
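The 99.97% figure and the resulting bound can be checked numerically with Python's standard library (`statistics.NormalDist`, Python 3.8+). The helper name and the example values of $d_n^{(m)}$ and $\sigma$ are hypothetical.

```python
from statistics import NormalDist

Z = 3.4  # safety factor used in the text


def route_delay_bound(d_budget_ms, sigma_ms):
    """L_n^(m) = d_n^(m) - 3.4 * sigma, the bound used in route selection."""
    return d_budget_ms - Z * sigma_ms


# Fraction of a N(mu, sigma^2) path delay that falls below mu + 3.4 sigma.
print(round(NormalDist().cdf(Z), 4))  # 0.9997
# Hypothetical d_n^(m) = 310 ms and sigma = 20 ms give L_n^(m) = 242 ms.
print(route_delay_bound(310, 20))
```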

In vSkyConf, each surrogate n dynamically estimates the standard deviation $\sigma$ of the delay along the path from m to n, based on the inter-packet latencies of flow m that it receives. It also observes the current queueing delay in buffer $b_n^{(m)}$, and adjusts the $L_n^{(m)}$ used in path selection according to $L_n^{(m)} = d_n^{(m)} - 3.4\sigma$. That is, if there are fewer packets in the buffer because of a larger delay variance, it tunes $L_n^{(m)}$ down to be more stringent on the latency requirement in path selection; otherwise, it tunes $L_n^{(m)}$ up to explore paths with better bandwidth. In this way, the buffering mechanism at the surrogates collaborates with the routing algorithm to deal with randomness in the system and inaccuracy in the computation, while maximally guaranteeing synchronized playback of all streams at all the mobile users.
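The online adjustment can be sketched as a per-flow tuner that keeps a running estimate of $\sigma$ with Welford's algorithm and recomputes the bound from it. This is an assumption-laden illustration: the class name, the use of per-packet path-delay samples, and Welford's method are our choices, and the sketch ignores the buffer-occupancy signal that surrogate n also observes.

```python
import math


class DelayBoundTuner:
    """Per-flow bound L_n^(m) at surrogate n (hypothetical helper).

    Keeps a running mean/variance of observed path delays with Welford's
    algorithm and recomputes L_n^(m) = d_n^(m) - z * sigma.
    """

    def __init__(self, d_budget_ms, z=3.4):
        self.d_budget_ms = d_budget_ms
        self.z = z
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def observe(self, delay_ms):
        """Add one path-delay sample of flow m (e.g. arrival - TimeStamp)."""
        self.count += 1
        delta = delay_ms - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (delay_ms - self.mean)

    @property
    def sigma(self):
        # Sample standard deviation; zero until two samples are available.
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def bound(self):
        # Larger jitter -> larger sigma -> tighter (smaller) latency bound.
        return self.d_budget_ms - self.z * self.sigma


tuner = DelayBoundTuner(d_budget_ms=310)
for sample in (100, 110, 90, 100):
    tuner.observe(sample)
print(tuner.mean, round(tuner.sigma, 2), round(tuner.bound(), 1))  # 100.0 8.16 282.2
```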

V. Performance Evaluation 
A. Prototype Implementation and Deployment 

We implement a prototype of vSkyConf and deploy it in Amazon Elastic Compute Cloud (EC2), for multi-party video conferencing among users from various geographic locations. To generate reproducible experiment results, each mobile user is emulated by a machine near its assigned EC2 region (within 50 ms), where video frames are generated at a constant rate of around 1049 kbps (25 fps) from a video captured by an iSight webcam. Surrogates are provisioned from the "ap-southeast-1a" region (Singapore) for Hong Kong users, "eu-west-1a" (Ireland) for European users, and "us-west-1b" (California) and "us-east-1a" (Virginia) for users in the western and eastern US, respectively. To showcase the adaptability and self-healing capability of our system against abrupt network fluctuations, we emulate dynamic environments by manually injecting jitter on the links between surrogates via Dummynet [20]. We implement an application-layer packet controller to limit the link capacities between surrogates to the range of [128, 1050] kbps. Both the uplink and downlink bandwidths of each emulated mobile user are within the range of [1.5, 2] Mbps, in line with those of regular 3G cellular connections. We apply the concave function log(x) as the utility function in our routing computation. The latencies between surrogates are the actual delays between Amazon EC2 instances. The transcoding latencies are pre-evaluated on the VM instances and used in our routing computation, for transcoding from 768 kbps to 256 kbps, from 768 kbps to 128 kbps, and from 256 kbps to 128 kbps, which are all the transcoding cases in our setup. On each of our emulated mobile clients, the stream from one of the other conference participants is displayed on a large screen (corresponding to a maximal acceptable streaming rate of 768 kbps), and streams from the other participants are displayed on smaller screens (corresponding to maximal acceptable streaming rates of 128 kbps or 256 kbps). In addition, a fixed 400 ms end-to-end delay (D in Sec. IV-C) is configured, and the buffer for each flow at each surrogate is set to a size corresponding to 400 ms of stream playback.

A lightweight stream transmission protocol among surrogates is implemented on top of UDP. The packet header has a 13-octet mandatory part, plus 3 octets reserved for future extensions, as shown in Fig. 6. The "TimeStamp" field (4 bytes) represents the moment when the packet is generated. Before attaching the current time to a packet, the surrogate should add in the first-mile latency between the mobile user and itself. The "Flow ID" field (4 bytes) indicates who generated the packet, by carrying the IP address of the source surrogate. "Rate" (2 bytes) represents the bit rate (kbps) of the stream encapsulated in the packet, "FR" (1 byte) stands for the frame rate (fps), and "Seq" (1 byte) is the sequence number of the packet in the flow. There are a total of $\lceil F / P_{\max} \rceil$ packets for each video frame of F bytes, where $P_{\max}$ represents the maximal payload length of a vSkyConf packet, chosen to be 512 bytes in our implementation. A frame is lost if any packet of this frame is lost. The "Codec" field (1 byte) indicates the codec of the stream, for reference when transcoding at the surrogates.
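The header layout and the per-frame packet count can be illustrated with Python's `struct` module. This is a sketch under assumptions: we follow the field order drawn in Fig. 6, and assume network byte order, a millisecond timestamp truncated to 32 bits, an IPv4 Flow ID, and the 3 reserved octets at the end; none of these details is confirmed by the text. At the 1049 kbps, 25 fps source rate of Sec. V-A, an average frame is about 5245 bytes, i.e., 11 packets of at most 512 bytes.

```python
import math
import socket
import struct

# Assumed wire layout (network byte order), following the field order of Fig. 6:
# TimeStamp (4 B), Flow ID (4 B, IPv4 address of the source surrogate),
# Rate (2 B, kbps), Codec (1 B), FR (1 B, fps), Seq (1 B), then 3 reserved
# octets for future extensions: 13 mandatory + 3 reserved = 16 bytes.
HEADER = struct.Struct("!I4sHBBB3x")
P_MAX = 512  # maximal payload length of a vSkyConf packet, in bytes


def pack_header(timestamp_ms, src_ip, rate_kbps, codec, fps, seq):
    # Timestamp and sequence number wrap to fit their 4-byte / 1-byte fields.
    return HEADER.pack(timestamp_ms & 0xFFFFFFFF, socket.inet_aton(src_ip),
                       rate_kbps, codec, fps, seq & 0xFF)


def unpack_header(packet):
    ts, ip, rate, codec, fps, seq = HEADER.unpack_from(packet)
    return ts, socket.inet_ntoa(ip), rate, codec, fps, seq


def packets_per_frame(frame_bytes):
    # ceil(F / P_max) packets carry one frame of F bytes.
    return math.ceil(frame_bytes / P_MAX)


hdr = pack_header(123456, "10.0.0.1", 768, 1, 25, 7)
print(len(hdr), unpack_header(hdr))
# Average frame at 1049 kbps and 25 fps: 1049000 / 8 / 25 = 5245 bytes.
print(packets_per_frame(5245))  # 11
```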
[Figure: IP Header | UDP Header | vSkyConf Header (32 bits wide: TimeStamp, Flow ID, Rate, Codec, FR, Seq) | vSkyConf Packet Payloads]

Fig. 6. vSkyConf packet header.

B. Adaptive Flow Rates at vSkyConf Clients 

We test a video conferencing session among 10 participants: 
5 from Hong Kong, 1 from Europe, 2 from US West and 2 
from US East, respectively. Random jitters up to 150 ms are 
imposed on the links between the surrogates in Europe and in 
Hong Kong. 

As a potential bottleneck for scalability, the surrogate of the session initiator is responsible for maintaining the session, by handling the "heartbeat" messages and periodically broadcasting the user lists to the other surrogates in the session. We therefore investigate the conferencing performance at the initiator's surrogate: if its performance is satisfactory, then the performance at the other surrogates should be even better. Fig. 7 illustrates the flow rates for streams from 3 of the other 9 conference participants (plotting all 9 curves in one figure would hurt readability). "flow-b" is the flow from the European user, configured to be displayed on the main large screen at the initiator's mobile device (corresponding to a maximum streaming rate of 768 kbps); "flow-a" and "flow-c" are displayed on smaller screens at the initiator (corresponding to maximum streaming rates of 128 kbps and 256 kbps, respectively), and come from Hong Kong and US West, respectively, with the latter joining the session at a later time.



We can see that both "flow-a" and "flow-b" go through a "fast" start stage while the basic stream dissemination topology is being constructed (as introduced in Sec. IV-B), and then evolve towards their maximal acceptable rates. "flow-a" reaches its maximal rate quickly, while "flow-b", affected by the jitter injected on the link between "eu-west-1a" and "ap-southeast-1a", stays on that link before being redirected to a better routing path. After "flow-c" joins the session at around 47 seconds, the rate of "flow-a" drops slightly until "flow-c" adjusts its routing path shortly afterwards. This is caused by link contention between the new "flow-c" and the existing "flow-a".

Fig. 8 presents the load of the jitter buffer for "flow-b" at the initiator's surrogate. The buffering level varies significantly while "flow-b" takes a path through the link between "eu-west-1a" and "ap-southeast-1a", due to the jitter along that link. This causes $L_n^{(m)}$ to be tuned down, so that the current path of "flow-b" violates the end-to-end latency constraint in optimization (1). The algorithm then redirects "flow-b" through a better path via the "us-west-1b" region, which leads to a more stable buffering level later on.

Fig. 9 shows the latency of each flow from its source surrogate to the initiator's surrogate. Latencies vary only slightly whenever the routing paths are adjusted, and stay well within the required end-to-end latency (400 ms). The latency of "flow-b" varies more significantly, due to the jitter manually imposed on the link between "eu-west-1a" and "ap-southeast-1a", until the algorithm redirects the flow to a better path.

All the above results show that the streaming rates adapt promptly to the network conditions among the surrogates, and that stream playback at the initiator is smooth, with stable end-to-end delays that meet the requirement.
Fig. 7. Flow rates at the initiator's surrogate.

Fig. 8. Load of flow-b's buffer at the initiator's surrogate.

Fig. 9. Flow latencies at the initiator's surrogate.

C. Performance Comparison with a Unicast Solution

We next evaluate the performance of vSkyConf against a unicast scheme typically applied by peer-to-peer video conferencing solutions, where each flow is transmitted directly from the source to the destination via the cellular network (via the Internet in our emulated experiments). For a fair comparison, we establish a 3-user video conferencing session, since the uplink bandwidth limits the conference size in a unicast scheme. We emulate a 50-minute conferencing session with one user from each of Hong Kong, Europe and the western US. Other experimental settings are the same as in Sec. V-B, except that no emulated jitter is imposed on the link from "eu-west-1a" to "ap-southeast-1a". Fig. 10 shows the perceived end-to-end latencies of the two flows received by the user in Hong Kong under the unicast solution and under vSkyConf, where "eu" stands for Europe and "usw" stands for the western US. The end-to-end latency achieved with vSkyConf is generally smaller, and much more stable, than that achieved by the unicast solution.

Fig. 11 compares the streaming smoothness of the two solutions by evaluating the amount of time-out delay incurred during the streaming of each flow, due to late packets received after their playback deadlines. The x value indicates when a packet time-out occurs, and the y value indicates how far the packet is delayed beyond its deadline. Again, far fewer packets arrive after their playback deadlines in vSkyConf, confirming the smooth stream playback experienced by vSkyConf users. This shows that our cloud-assisted design is well suited to high-quality video conferencing among multiple mobile participants.
Fig. 10. End-to-end latency experienced at the Hong Kong user.

Fig. 11. Packet time-out delay at the Hong Kong user.

VI. Conclusion and Future Work

This paper presents vSkyConf, a cloud-assisted mobile video conferencing system designed to fundamentally improve the quality and scale of multi-party mobile video conferencing. In vSkyConf, a virtual machine in a cloud infrastructure is employed as the proxy for each mobile user, to send and receive conferencing streams and to transcode the streams into proper formats/rates. We design a fully decentralized, efficient algorithm to decide the best paths of stream dissemination and the most suitable surrogates for video transcoding along the paths, and tailor a buffering mechanism on each surrogate to cooperate with optimal stream distribution. These designs guarantee bounded, small end-to-end latencies and smooth stream playback at the mobile devices. We have implemented vSkyConf based on Amazon EC2 and verified the excellent performance of our design, as compared to the widely adopted unicast solutions. As future work, we seek to test vSkyConf under more dynamic settings.

Appendix 

Theorem 1: No cycles exist in the adjusted dissemination tree.

Proof: The initial feasible solution contains no cycles, since it is a shortest-path tree $T^{(m)}$. Suppose a cycle appears when a surrogate i adjusts its upstream surrogate from i' to k, i.e., k is a descendant of i along the tree path $\cdots, i', i, j, \cdots, k$. The down-sampling transcoding mechanism applied in our algorithm guarantees that $c_i^{(m)} \ge c_j^{(m)} \ge \cdots \ge c_k^{(m)}$, where $c_x^{(m)}$ denotes the rate of flow m received at surrogate x. According to Alg. 2, i can only adjust its upstream surrogate to k when $c_k^{(m)} > c_i^{(m)}$, which is a contradiction. ■
Alg. 2 sketches the core logic, independent of any specific implementation. Our prototype, vSkyConf, uses gossip-like message exchanges between neighbouring surrogates to facilitate the construction of dissemination trees.

Each surrogate m maintains two key tables, i.e., the Candidate Upstream Surrogate Table ($CUSTab_m$) and the Downstream Surrogate Table ($DSTab_m$). CUSTab keeps track of the possible paths for each flow, from which the surrogate can immediately choose one once a routing adjustment is needed. DSTab keeps track of all the downstream surrogates for each flow, to which the surrogate should relay the flow after any necessary transcoding. Based on the constantly updated DSTab, each surrogate is associated with a metric pair for each flow, i.e., <requested-rate, maximal-delay>. requested-rate represents the rate the surrogate should request from its upstream surrogate, while maximal-delay represents the maximal delay allowed when choosing a path, and helps filter out unqualified upstream surrogates. Let $\alpha_i^{(m)}$ represent the requested rate for flow m at surrogate i, and $\beta_i^{(m)}$ represent the maximal delay for flow m when surrogate i chooses a path. Both can be defined recursively, as in Eqns. (9) and (10):
$$\alpha_i^{(m)} = \begin{cases} \min\{R_i^{(m)}, C_{i'i}\}, & DSTab_i = \emptyset \\ \min\{\max_{j \in DSTab_i}\{R_i^{(m)}, \alpha_j^{(m)}\}, C_{i'i}\}, & DSTab_i \neq \emptyset \end{cases} \quad (9)$$

where i' is the upstream surrogate of i for flow m.



3(m) 



(m) (m) 



)} 



{DSTah / (?!>) 



(10) 
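Eqn. (9) lends itself to a post-order recursion over the DSTab entries. This is a minimal sketch with hypothetical names and a made-up three-surrogate tree; it assumes every surrogate passed in has an upstream surrogate (the source surrogate itself is not evaluated).

```python
def requested_rate(i, parent, children, R, C):
    """alpha_i^(m) of Eqn. (9) for one flow m (hypothetical helper).

    parent:   {surrogate: its upstream surrogate i'}
    children: {surrogate: list of downstream surrogates}  (DSTab_i)
    R:        {surrogate: maximal acceptable rate R_i^(m) of its own user}
    C:        {(i', i): remaining bandwidth C_{i'i} of the upstream link}
    """
    # With an empty DSTab_i the max is over {R_i} alone, i.e. min{R_i, C_i'i}.
    wanted = max([R[i]] + [requested_rate(j, parent, children, R, C)
                           for j in children.get(i, [])])
    return min(wanted, C[(parent[i], i)])


# Made-up tree for one flow: source s -> x -> y (rates in kbps).
parent = {"x": "s", "y": "x"}
children = {"x": ["y"]}
R = {"x": 256, "y": 768}
C = {("s", "x"): 1024, ("x", "y"): 512}
print(requested_rate("x", parent, children, R, C))  # 512
```

Here y can only receive 512 kbps over link (x, y), so x requests 512 kbps rather than its own user's 256 kbps, which lets it serve y at the highest feasible rate.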



Each surrogate i issues "Path Broadcast" messages, informing its neighbouring surrogates of the bit rates of any flows it can offer, as shown in Fig. 12. "Rate" represents the current rate of the flow labelled by "Flow ID". "MaxRate" represents the requested rate defined above. "Latency" represents the actual latency from the source surrogate to the current surrogate, i.e., $\omega_i^{(m)}$. "VM configuration" represents the VM instance type of the surrogate. Upon receiving a "Path Broadcast" message for flow m from surrogate j, a surrogate i records it into $CUSTab_i$ only if the latency $\omega_i^{(m)}$ is no larger than $\beta_i^{(m)}$, where $\omega_i^{(m)}$ can be estimated as $\omega_j^{(m)} + d_{ji} + \varphi_j(\alpha_j^{(m)}, \alpha_i^{(m)})$.
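The CUSTab admission rule can be sketched as follows. The `PathBroadcast` fields mirror Fig. 12 except "VM configuration", which is omitted; the function name, the table layout and the transcoding-latency model `phi` are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class PathBroadcast:
    sender: str          # surrogate j that issued the message
    flow_id: str         # "Flow ID"
    rate_kbps: int       # "Rate": current rate of the flow at j
    max_rate_kbps: int   # "MaxRate": requested rate alpha_j^(m)
    latency_ms: float    # "Latency": omega_j^(m)


def update_custab(custab, msg, link_delay_ms, alpha_i, beta_i, phi):
    """Record msg in CUSTab_i only if omega_j + d_ji + phi_j(alpha_j, alpha_i) <= beta_i.

    custab:        {flow_id: {sender: estimated omega_i}} (hypothetical layout)
    link_delay_ms: {neighbour j: measured d_ji}
    phi:           transcoding-latency model phi(in_rate, out_rate) (placeholder)
    """
    est = msg.latency_ms + link_delay_ms[msg.sender] + phi(msg.max_rate_kbps, alpha_i)
    if est > beta_i:
        return False  # j cannot meet the maximal delay beta_i^(m); filter it out
    custab.setdefault(msg.flow_id, {})[msg.sender] = est
    return True


def phi(rate_in, rate_out):
    # Placeholder model: 20 ms whenever the rate changes, 0 ms otherwise.
    return 0.0 if rate_in == rate_out else 20.0


tab = {}
print(update_custab(tab, PathBroadcast("j", "m", 512, 768, 120.0),
                    {"j": 40.0}, 256, 200.0, phi))  # True
print(tab)  # {'m': {'j': 180.0}}
```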
Fig. 12. Gossip-like message communication.

Lemma 1: For any adjacent surrogates i and j in a dissemination tree $T^{(m)}$, with $(i, j) \in T^{(m)}$, $\varphi_i(\alpha_i^{(m)}, \alpha_j^{(m)}) \ge \varphi_i(c_{i'i}^{(m)}, c_{ij}^{(m)})$, where i' is the upstream surrogate of i for flow m.

Proof: We know $c_{i'i}^{(m)} \le \alpha_i^{(m)}$, $c_{ij}^{(m)} \le \alpha_j^{(m)}$ and $\alpha_i^{(m)} \ge \alpha_j^{(m)}$ according to the definition. If $\alpha_i^{(m)} > \alpha_j^{(m)}$, then $\varphi_i(\alpha_i^{(m)}, \alpha_j^{(m)}) \ge \varphi_i(c_{i'i}^{(m)}, c_{ij}^{(m)})$, since the transcoding latency $\varphi(\cdot, \cdot)$ is monotonically increasing in both the input and output bit rates (see Sec. IV). Otherwise, $\alpha_i^{(m)} = \alpha_j^{(m)}$. Suppose $c_{i'i}^{(m)} \neq c_{ij}^{(m)}$ (i.e., $c_{i'i}^{(m)} > c_{ij}^{(m)}$ under a down-sampling-only mechanism); we can then derive $C_{ij} \ge \alpha_j^{(m)} = \alpha_i^{(m)} \ge c_{i'i}^{(m)} > c_{ij}^{(m)}$. This means the actual bit rate of flow m perceived by surrogate j is lower than both the requested bit rate and the remaining link bandwidth, which does not conform to Alg. 2. Hence $c_{i'i}^{(m)} = c_{ij}^{(m)}$, and $\varphi_i(\alpha_i^{(m)}, \alpha_j^{(m)}) = \varphi_i(c_{i'i}^{(m)}, c_{ij}^{(m)}) = 0$. To sum up, in either case, $\varphi_i(\alpha_i^{(m)}, \alpha_j^{(m)}) \ge \varphi_i(c_{i'i}^{(m)}, c_{ij}^{(m)})$. ■



Theorem 2: Adjusting the dissemination tree does not violate the latency bounds for impacted surrogates.

Proof: Once the dissemination tree $T^{(m)}$ is adjusted by redirecting surrogate i to a new upstream surrogate under the condition $\omega_i^{(m)} \le \beta_i^{(m)}$, the potentially affected nodes can only be in the subtree of $T^{(m)}$ rooted at i. Suppose the latency perceived by a surrogate j in this subtree violates its latency constraint, i.e., $\omega_j^{(m)} > L_j^{(m)}$. We can find a path from i to j. Assume the upstream surrogate of j is k; the corresponding latency is $\omega_j^{(m)} = \omega_k^{(m)} + d_{kj} + \varphi_k(c_{k'k}^{(m)}, c_{kj}^{(m)})$, where k' is the upstream surrogate of k for flow m. Since $L_j^{(m)} \ge \beta_j^{(m)}$ according to the definition, and $\varphi_k(c_{k'k}^{(m)}, c_{kj}^{(m)}) \le \varphi_k(\alpha_k^{(m)}, \alpha_j^{(m)})$ (guaranteed by Lemma 1), we have

$$\omega_k^{(m)} = \omega_j^{(m)} - d_{kj} - \varphi_k(c_{k'k}^{(m)}, c_{kj}^{(m)}) > L_j^{(m)} - d_{kj} - \varphi_k(\alpha_k^{(m)}, \alpha_j^{(m)}) \ge \beta_j^{(m)} - d_{kj} - \varphi_k(\alpha_k^{(m)}, \alpha_j^{(m)}) \ge \beta_k^{(m)}.$$

Similarly, along the path, we can derive $\omega_i^{(m)} > \beta_i^{(m)}$, which contradicts the condition above. ■